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Abstract: Recent results from ATLAS and CMS indicating a Higgs mass around 125 GeV, 
further constrain already highly constrained supersymmetric models such as pMSSM or 
CMSSM/mSUGRA. Finding potentially discoverable and non-excluded regions of model 
parameter space is becoming increasingly difficult. Several groups have invested large effort 
in studing the consequences of Higgs mass bounds, upper limits on rare B-meson decays, 
and limits on relic dark matter density on constrained models, aiming on predicting su- 
perpartner masses, and establishing likelihood of SUSY models compared to that of the 
Standard Model vis-avis experimental data. In this paper a general framework for efficient 
search for discoverable, non-excluded regions of different SUSY spaces giving specific ex- 
perimental signature of interest is presented. The method employs an improved Markov 
Chain Monte Carlo scheme exploiting an iteratively updated likelihood function to guide 
search for viable models. Existing experimental and theoretical bounds as well as the LHC 
discovery potential are taken into account. This includes recent bounds on relic dark mat- 
ter density, the Higgs sector and rare S-mesons decays. A clustering algorithm is applied to 
classify selected models according to expected phenomenology enabling automated choice 
of experimental benchmarks and regions to be used for optimizing searches. The aim is 
to provide experimentalist with a viable tool helping to target experimental signatures to 
search for, once a class of models of interest is established. As an example a search for 
viable CMSSM models with r-lepton signatures observable with the 2012 LHC data set is 
presented. In the search 284496 unique models were probed. From these, nine reference 
benchmark points covering different ranges of phenomenological observables at the LHC 
were selected. 
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1 Introduction 



Supersymmetry (SUSY) may alleviate many of the problems associated with the Standard 
Model of particle physics (SM) if the mass of the superpartners lays close to the TeV- 
scale [1], [2]. Furthermore, it provides a natural dark matter candidate in the form of 
the Lightest Supersymmetric Particle (LSP), if i?-parity is conserved [3]. However, even 
the simplest SUSY extension of the SM, the so called Minimal Supersymmetric Standard 
Model (MSSM), introduces over 100 new free parameters making them very difficult to 
experimentally constrain. On the other hand a large part of the MSSM parameter space is 
already ruled out, as it would lead to unobserved phenomena like non- conservation of lepton 
numbers, flavour changing neutral currents or large CP violation [4]. It is therefore common 
practice to look at constrained models that assume a partial unification of parameters at 
some high energy scale and where the dynamics of the high energy theory ensures more 
viable phenomenologies [5]. The minimal SUper GRAvity model (mSUGRA) [6] is an 
example of such a constrained model where the SUSY parameters are assumed to unify at 
the GUT scale into five universal parameters, a common scalar mass mo, a common gaugino 
mass the ratio between the SUSY Higgs vacuum expectation values tan/3, a common 

trilinear Higgs-sfermion coupling Aq, and the sign of the Higgsino mass parameter /x. In 
the "lighter" version of it, the so called constrained MSSM (CMSSM) [7-9] gravitino mass 
is not forced to unify at the same scale as other gaugino masses. In NUHM (Non-Universal 
Higgs Masses) [10, 11] models, masses of Higgs bosons do not unify with sfermions to the 
common ttiq. 

ATLAS and CMS experimental searches for SUSY usually present results only in two- 
dimensional slices of the parameter space of some simplified model assuming fixed values 
for other parameters [12, 13]. Due to complicated dependence of physical masses and thus 
experimental signatures on all the model parameters, it is easy to leave specific corners of 
the model space unexplored in such an approach, leaving out regions where experimental 
search may have large discovery potential. This has lead several theoretical groups [14, 15] 
to reinterpret experimental searches in different regions of parameter space with help of 
simplified simulators of detector response like DELPHES [16] or PGS [17]. This approach 
can be relatively reliable for moderately simple experimental signatures involving jets and 
missing transverse energy (^t), but it cannot be trusted for more difficult experimental 
objects like photons or tau leptons. 

Bayesian parameter inference and MCMC has been successfully employed to find the most 
viable region of the full parameter spaces, based on requirements that the models should 
be in accordance with recent experimental constraints [14, 15], including these on the Higgs 
boson mass and rare 5-mesons decays. While such scans provide a more complete picture 
of the still allowed regions of parameter space, few take into account the most relevant 
experimental signatures to search for or whether these parameter space regions are within 
experimental reach. This poses difficulties for experimentalists when trying to make direct 
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use of the results. 



In this paper, a general MCMC-based framework for determining the part of non-excluded 
model parameter space where a given experimental signature can be observed is presented. 
By adding a signature specific discoverability parameter to the set of current experimen- 
tal constraints, the interesting regions of the parameter space are found. Models from 
these regions are then clustered according to phenomenology with use of g-means [18] 
algorithm to enable an automatized construction of reference points for optimizing exper- 
imental searches. This step distinguishes our approach from existing similar frameworks 
for example [19]. The procedure is applicable to a wide range of signatures and models. 
In order to provide a proof of concept, a concrete example defining a non-excluded part 
of CMSSM parameter space which could be discoverable with r-leptons in the 2012 LHC 
data is outlined. 

The paper is structured as follows: section 2 discusses the publicly available software tools 
used to calculate low energy CMSSM observables, scan and clustering algorithms, as well 
as the specific constraints and phenomenological parameters used. Section 3 describes 
the results of the scan and the phenomenological reference points constructed. Appendix 5 
explains the details of the algorithm implementation and presents cross-checks of the effects 
of experimental constraints with other existing results. In Section 4, a summary and 
comments on the procedure are provided. 



2 Algorithms and Tools 

Experimental constraints on dark matter relic density fth^ as well as on rare processes 
such as Bg iJifJi and 6 ^ 5 + 7 set strict bounds on the parameter space of CMSSM 
(see for example [15]). Furthermore, if the recent indications of a Higgs boson with mass 
around 125 GeV [20] turns out to be correct, the fraction of viable models within current 
experimental reach is extremely small. This renders simple uniform scans highly inefficient. 
A rough random scan made to explore the parameter dependence in CMSSM, gave a 
fraction of 10~^ models in accordance with current experimental constraints. Therefore, 
more advanced techniques need to be employed to get a representative picture of the 
discoverable and non-excluded regions of parameter space in an efficent way. 

The approach used in this paper is to employ a likelihood distribution P, that reflects 
how well models fit the data, to perform a guided random walk through parameter space 
using Markov Chain Monte Carlo (MCMC) [21]. This increases the search efficiency as 
the parameter space is sampled according to the distribution P thus less time is spent 
sampling low likelihood regions. In this work an adaptive MCMC is implemented, where 
the likelihood map is based on the compatibility of low energy properties of CMSSM models 
with experimental and theoretical constraints. These properties are calculated using several 
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publicly available software tools. 
2.1 Software Tools 

A series of publicly available software tools is used to calculate the low energy parameters 
needed to check experimental constraints on the SUSY models, and to construct the like- 
lihood map used in the MCMC scan. Parameters are passed between the different tools 
using the SLHA-interface [22]. The tools are called in sequence starting with the least 
computationally costly, and after each step the likelihood is updated based on the avail- 
able parameters. Each component (z) of the likelihood is constructed to have a maximal 
value of so that the likelihood always decreases as the chain progresses. This makes 

it possible to check for rejection after every step in the tool sequence, and enables early 
termination of the calculations for a large fraction of low likelihood models. 

In the first step, ISAJET with isaRED [23] is used to run the GUT scale universal pa- 
rameters down to the electroweak scale, calculate T {Bg iifi)^ and to check weather the 
models are allowed by several theoretical constraints, including requirements of a x? LSP 
and correct electroweak symmetry. In the next step, FeynHiggs [24] and HiggsBounds [25] 
are used to recalculate and check if the model fulfills experimental constraints on the Higgs 
sector. Afterward, the dark matter relic density, fi/i^, and F (6 ^ 5 + 7) is calculated us- 
ing darkSUSY [26], which also checks against experimental constraints on sparticle masses 
from LEP Ap and Z- width (see for example [27, 28]). Finally, 1000 pp signal events at 
= 8 TeV are generated using Pythia [29] in order to get a leading order estimate of the 
SUSY cross-section, ctlo? and to calculate the fraction of events (F^-, Fjg^ . . .) containing 
respectively at least one r, e, /x, jet with pseudorapidity in the central part of the detec- 
tor, 1 77 1 < 2.5, and sufficiently large momentum in the plane perpendicular to the beam 
axis, pt > 20 GeV, and the average number of these objects per SUSY event (n^, rijet • • •)• 
For each of these objects, the average pT is calculated for the two ones with the high- 
est transverse momentum. The average missing transverse energy per event, IJti is also 
calculated. 

The software tools and their employment are summarized in Table 1. 

Table 1: Software tools and resulting information employed in this work. Average values 
from Pythia are for final states objects with I77I < 2.5 and pT > 20GeV. 



Tool 


Information used 


ISAJET 7.81 & isaRED 


SUSY masses, F (Bg ijlij) 


FeynHiggs 2.9.2 & HiggsBounds 3.7.0 


Higgs sector 


darkSUSY 5.0.5 


r(6^ s + 7) 


Pythia 8.162 


crLO,r, (n,pTi,PT2) for r, e,/i,jet,(^/T) 
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2.2 Likelihood Map and Experimental Constraints 



The hkehhood map P used to explore CMSSM parameter space is constructed by com- 
bining a hkehhood Pexp based on experimental and theoretical constraints with an ad-hoc 
likelihood related to the expected number of events with tau leptons, Pr- Here Pr is based 
on the probability of producing observable r-leptons with 15/ fb of the LHC data collected 
in 2012. Pr can be easily replaced by another likelihood function related to observability of 
any signal of interest. The likelihoods are normalized so that each individual contribution 
Pi has a maximal value max(P^) = 1. Thus, the full likelihood becomes: 



where Pi are the likelihoods related to experimental limits and theoretical constraints. 
Some of Pi are either or 1 as specified in table 2. These include most of theoretical 
constraints, limits checked internally by the software tools used as well as the new experi- 
mental limits on Bg fijJ^ limits [30], F (Bs ijlij) < 4.5 • 10~^. For other experimentally 
measured quantities Gaussian errors are assumed and the resulting likelihoods are contin- 
uous. For example Gaussian distributions around the central experimental values are used 
for F (6 ^ 5 + 7) and the Higgs mass, while for the relic density a uniform distribution is 
chosen with a Gaussian tail above the best observational value. The latter accepts models 
which give the relic density lower than the WMAP result [31], allowing for other unknown 
sources except of CMSSM neutralinos to contribute to the relic density. The central values 
and standard deviations used are nh^ = 0.1126 ± 0.0036 for the relic density [31] and 
F (6 ^ 5 + 7) = (3.55 ± 0.42) • 10~^ [32] for the charmless 6-quark decay, where a theoret- 
ical uncertainty ath = ±0.33 • 10~^ is assumed [33]. The central value for the Higgs mass 
rriho = (125 =b 1) GeV is assumed in agreement with the recent ATLAS and CMS results. 
The experimental and theoretical constraints are summarized in table 2. The 2011 and 
2012 ATLAS and CMS results of direct searches for SUSY in i?-parity conserving channels 
are not included in the present work. The reason for it is two-fold. Firstly, the Higgs mass 
of ruho = (125 =b 1) GeV translates in CMSSM into rather high sparticle masses, where the 
present direct searches results have little effect. Secondly, being members of the ATLAS 
Collaboration we know how unreliable the translation of present limits into other region 
of parameter space can be, whenever one uses only approximate modelling of the detector 
response. We thus prefer to use this opportunity to provide tools to experimenters so that 
they can chose somewhat more interesting regions of SUSY parameter space to present 
their results. 

The discoverability likelihood Pr is constructed as 



= Pexp • Pt and 




(2.1) 



Pk-Pd. 



(2.2) 
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Table 2: Experimental and theoretical constraints used and the associated likelihoods 



Constraints 


Likelihoods Pi 


Values 


X? LSP, Correct EWSB, No tachyons . . . 
Sparticle masses, Ap, Z-width 
OK Higgs sector 
Branching fraction Bg ijlijl 


OK 
OK 
OK 
OK 


1 Not OK: 
1 Not OK: 
1 Not OK: 
1 Not OK: 


ISAJET 7.81 
darkSUSY 5.0.5 
HiggsBounds 3.7.0 

V{Bs ^ fifj) < 4.5 -10-^ 


Relic density Q.h^ 
Branching fraction 6^5 + 7 
Higgs mass tuhq 




[fJ^n^an] = [0.1126,0.0036] 
[/ibsg,absg] = [3.55,0.42] .10-^ 
[fiho^cTho] - [125,1] GeV 



with a discrete part Pk requiring that either the lightest chargino or the second lightest 
neutralino is heavier than the stau. This is motivated by the fact that in CMSSM r-leptons 
are mainly produced through the decay of either X2 xf- Continuous likelihood Pd is 
a Poisson discoverability measure constructed from the sum of likelihoods for observing a 
given number of tau events, A^^ > 1, given the expected number of events containing at 
least one r, {Nr) = F^- • £ • aLO- 

Pd- nNr\{Nr)) , P(Ar.|(ArO)^ ^^-^''^'"P^"^^-^] . (2.3) 

Leading order SUSY cross-section glq in VP collisions at 8 TeV center-of-mass energy, 
luminosity of 15/ fb and the fraction of events containing at least one T,r^, as found from 
Pythia, are used to calculate {N^) above. 

The search range in CMSSM parameter space follows the suggestion in [34]. The ranges 
for mo, ^1/25 ^0 and tan/3 are presented in table 3. Only positive values of sign(/x) are 

considered where the muon anomalous magnetic moment, {g — 2)^, constraint seem to be 
easier to satisfy [15], but this constraint is not included in the present work. A fixed value 
for mtop = 173 GeV was taken. 

The lower bounds on the universal masses and on tan/3 in table 3 stem from LEP [28, 35] 
bounds, while the upper limits are chosen on the basis of naturalness for the masses mg 
and mi/2 and perturbativity of the Yukawa couplings for tan /3. The range for the trilinear 
coupling Aq is extended compared to [34], since the Higgs mass has a quadratic dependence 
on Aq [36] allowing for higher ruho at large values of l^ol. 

It is important to note that the more realistic the likelihood, the more computationally 
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Table 3: Search ranges used for the CMSSM MCMC scans. 



Parameter 



Range 



mi/2 
Ao 

tan P 
sign(/i) 



[60,3000] GeV 
[60,3000] GeV 
[-5000,5000] GeV 
[2,60] 
+1 



efficient is the search for interesting models. However, the set of models found does not 
depend on fine details of the likelihood function used, as we accept points in a range of 
likelihood. Our goal is to find a set of interesting models fulfilling latest experimental 
constraints, in order to guide further experimental searches. We do not intend to make 
any statistically quantified decision on which of the selected models are more likely than 
others. 



2.3 MCMC Algorithm 

The MCMC method used here is a Metropolis-Hastings algorithm [37] where, given a point 
X = {xi,X2 . . .X]j} in a D-dimensional parameter space, a proposal distribution, (5(y|x), 
is used to sample a new point y. The proposal distribution is related to the likelihood P 
of the new point y being "interesting" from the point of view of requirements described in 
section 2.2. The new point is accepted randomly with a probability given by 



If y is accepted, it is added to the chain and the next point is sampled starting from y. 
If it is not accepted, the chain remains at x and the process is repeated as illustrated in 
figure 1. The asymptotic distribution of likelihoods calculated for the resulting chain of 
points is the desired likelihood distribution P. 

In order to efficiently map possible high likelihood regions of the parameter space which 
are separated by large regions of low likelihood a regional adaptive MCMC algorithm 
similar to [38] has been implemented. The algorithm approximates the target likelihood 
distribution P as a mixture of normalized multivariate Gaussian distributions and uses this 
approximation as a basis for a proposal Q{y\x), as explained in the following section 2.3.1. 
This proposal is used to guide multiple MCMC search chains in parallel and it is iteratively 
updated according to the resulting selected sample of points. 




(2.4) 
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a) A proposal distribution Q(y|x) used to sample new 
points (yi, y2) from a point x. These points are either 
accepted (y2) or rejected (yi) depending on the ratio 
between the underlying likelihood P and Q as given 
by a (2.4) 
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b) Example MCMC random walk seeking out narrow 
high likelihood regions A, from an initial starting point 
in a low (A) or zero (C) likelihood region. Th grid of 
dots is shown to illustrate how such regions can be 
missed by uniform grid-based scans. 



Figure 1: Illustrations of the standard MCMC sampling method, la, and a typical MCMC 
random walk, lb 



2.3.1 Adaptive Multi Chain Monte Carlo 



An initial estimate for the proposal is constructed by uniformly sampling the space such 
that all separated regions where the likelihood P is high are covered. The points of CMSSM 
parameter space chosen for the initial sample are required to pass all discrete cuts and to 
give experimentally measured physical variables within a reasonable range of the experi- 
mentally preferred values, see table 2 for details. Sampled points in CMSSM parameter 
space are weighted according to their likelihood and clustered using k-means algorithm, 
see section 2.4. For each cluster of points we estimate the shape in CMSSM parameter 
space by calculating the the weighted mean /x vector and and covariance matrix Xl 

The number of clusters corresponds to the number of normalized Gaussian distributions 
(normal mixture) that is to be used to approximate the likelihood distribution P, as ex- 
plained below. 

Efficient sampling of high likelihood regions in the parameter space separated by large 
regions of low likelihood required a modification of the proposal in order to introduce a 
small fraction of large jumps [38, 39]. To this end a global proposal term ^'^(y), has 
been introduced in addition to the standard local one, ^^(ylx). The summary proposal 
distribution becomes: 
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Q(y|x) = /3gL(y|x) + (1 - p)qG{y) , 



(2.5) 



Here /3 is a mixing parameter relating the global and the local proposal terms, explained 
further. 

The global proposal qciy) is taken as the weighted mixture of a set of m multivariate 
normal distributions, AA, as in [40]. The number of multivariate normal distributions used 
is chosen high enough to describe the main features of the target distriubtion (number 
of seperated regions, curvature, etc), while each weight factor Wi in the formula below is 
estimated by summing up the total likelihood over CMSSM parameter space points in a 
cluster i. 



QG{y) = 2^ WiN{y\fii, , A/ (y l/x, E) = ^ — , , (2.6) 

i=i V(2^) 1^1 

Here /x^ is the vector of the means of the z'th normal distribution, while ^G,i is the co- 
variance matrix of the z'th component of the mixture. The local proposal g7^(y|x) is taken 
as a normal distribution with mean, x, and the covariance, Ti^^i^ characterizing the closest 
cluster in the parameter space. An euclidean distance measure is used and each parameter 
is scaled so that the search ranges defined in table 3 vary from to 1. The local proposal 
covariance is chosen so that: Ti^^i = aiTic^i , i G {1,2, . . .m}, where ai is a parameter 
adapted such that the local acceptance rate for points in the parameter space region within 
the cluster i is between 0.05 and 0.15. The rather low acceptance rate is chosen because 
the hierarchical nature of the likelihood calculation yields higher computational speed for 
low acceptance rates. Thus our optimal acceptance rate is probably lower than that of 0.23 
found in [41]. The acceptance probability for stepping from a given point x to a new point 
y is then given as: 



P(y) [/3gL(x|y) + (l-/3)gG(x)] \ 
'P(x)[/3g,.(y|x) + (l-/3)gG(y)]; ' ^ 

The search chains are started from random CMSSM parameter space points in the weighted 
sample and followed independently. After a given number of steps data is re-clustered 
and the proposals are updated, taking into account the new sampled parameter space 
points, where the new points are weighted according to the estimated likelihood. A fraction 
of points in each chain is discarded to ensure sampling from the equilibrium part of P 
distribution. This is done by requiring a certain likelihood threshold Pmin for the first 
relevant point in each chain [21, Ch. 1.11.4]. The implementation details are described in 
the Appendix 5. 



a(y|x) 



mm 



-9- 



2.4 Clustering Algorithm 



A suitable clustering algorithm based on k -means [42] has been devised in order to cluster 
hkelihood-weighted points in CMSSM parameter space. The role of clustering is two-fold. 
Firstly clusters are needed to calculate the approximate Gaussian distributions used in 
the proposal described in the previous section 2.3.1. Secondly, after the choice of the 
set of high-likelihood model-points in CMSSM parameter space is made, these points are 
clustered according to the different experimental signatures they are expected to exhibit 
in the detectors at the LHC. 

The k -means algorithm defines clusters in the parameter space by assigning each space 
point to the closest centroid, (C). The algorithm is initialized by choosing at random k 
points in the parameter space as cluster centers C. Next, each Ci is refined as the average 
of the points near to it and points are reassigned to the new C, and the procedure is 
repeated until it converges to a set of stable Cs. 

A number of improvement to the standard k -means [42] procedure was employed in this 
article in order to cure several undesirable features: arbitrary choice of the number of 
clusters, fc, slow convergence and sensitivity to the "outlying" parameter space points. 
To increase the speed of the algorithm, a maximum number of iterations and a minimum 
improvement between iterations was set for the centroids positions refinement. Furthermore 
the speed to find the nearest neighboring points was improved using a multidimensional 
binary search tree (fcd-tree) as in [43] resulting in the parameter space being searched for 
the points closest to a given centroid along alternating dimensions defined by the kd-tiee 
branches. The speed was further gained by grouping these branches such as to minimize 
the close points population [44]. 

The random first guess of centroids positions was improved as suggested in [45], {k- 
means++ algorithm). Here the probability of picking a new point was weighted by the 
square distance to the closest already picked point. This guess reduced as well the average 
number of iterations to achieve convergence. 

To determine the number of clusters, k, ^-means-algorithm [18] was employed. It started 
with applying k-means for k — 2^ thus dividing the parameter space into two sub-clusters. 
Then a statistical test to verify if the likelihood distributions of both clusters can be 
described by a single Gaussian was performed. If the test rejected the single Gaussian 
distribution hypothesis, the procedure was repeated recursively for each of the new clus- 
ters, otherwise recursion was terminated. The statistical test was done by means of the 
dimensional Anderson-Darling normality measure [46]. The distances between the points 
and a plane that separated clusters and perpendicular to the vector between the two cen- 
troids were subjected to the measure above. As the clustering could be sensitive both to 
outliers and to the random positioning of initial centroids, the whole splitting procedure 
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was iterated riavg times and the average number of clusters (k) was noted. Next, the cluster 
with k closest to {k) was picked. If there were several equidistant clusters, one of these 
was picked at random. In the final step the obtained centroids were subjected to the k- 
means clustering one more time to ensure that a stable configuration has been found. The 
k-means was repeated a very large number of times in the clustering procedure. Thus the 
speed optimizations described above played an important role in ensuring the algorithm 
was usable for this work. 



3 CMSSM with r Signatures 
3.1 Results 

All viable models found have large negative values of Aq in common, but otherwise span 
a relatively large range of sparticle masses and values of tan/3. The ranges for the mean 
$rj- and pt for leading jet and r are shown in the CMSSM mass planes mg — ^1/2 and 
Aq — tan /3 in figure 2 and two dimensional likelihood distributions are shown in figure 3. 
The distributions are constructed by binning the models into A^bins — 100 bins along each 
dimension, where the likelihood of each bin is approximated by the number of models 
contained. The effects of different constraints separately are discussed in the Appendix 5. 
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Pt(ti) Prijeti) $T 



Figure 2: Average value per bin for mean $rj. and pt foi" leading jet and r shown in the 
CMSSM mass plane mo — (above) and Aq — tan/3 plane (below) 
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Figure 3: Marginalized likelihood maps for different planes in CMSSM space. 
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The discoverability likelihood constrains the SUSY r production cross-section to not be too 
small. This sets upper bounds on how large the gaugino and scalar masses can be since 
the production cross-section falls sharply as the masses of colored sparticles grow. The 
cross-section has as well a slight Aq dependence which allows for higher masses at higher 
negative values of Aq. 




-5000 -4000 -3000 -2000 -5000 -4000 -3000 -2000 -5000 -4000 -3000 -2000 

Aq [GeV] [GeV] [GeV] 

Number of r events SUSY cross section fraction of r events 



Figure 4: Average value per bin for number of r events, SUSY cross-section, and the 
fraction of r events, in the mass plane mo — mi/2 (above) and Aq — tan/3 (below) 

3.2 Phenomenology and Reference Points 

The relatively wide range of values for SUSY masses and values of tan /3 found leads to a 
wide range of values of phenomenological properties such as average ^7^, the average missing 
energy per SUSY event, Pr('7"i),Pr(jeti), the average pT of the leading r and the leading 
jet, and nr,njet, the average number of r's/jets per SUSY event, see 2. 

Table 4: Range and best fit value for phenomenological parameters. 



$T [GeV] 
min cent max 


rijet 

min cent max 


jeti(pT) [GeV] 
min cent max 


min cent max 


ri(pT) [GeV] 
min cent max 


1 386 523 


0.0 2.5 4.5 


55 466 857 


0.0 0.6 2.0 


55 151 372 



The pt values for the leading jet and r lepton obviously tend to be higher for high sparticle 
masses, since higher masses in CMSSM lead to higher mass splittings between the sfermions 
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and the LSP. The pxs also become larger with Aq closer to and for high tan /3 values. 
The missing energy on the other hand tends to become larger at smaller scalar masses 
and increasing gaugino mass. The increase in ^/t with mi/2 is likely due to increasing 
neutralino mass. These dependencies are illustrated in figure 2. 

The average number of rs per SUSY events is mostly due to the branching fraction in to 
TS as it can be seen comparing figure 5 and 4. One tau with high pT per event is produced 
on average. The SUSY branching fraction to rs is largest at low values of tan/3 and mo- 
At least one high pt jet is expected in almost every event. The average numbers of jets 
increases with mo, tan/3 and \Ao\. 




-5000 -4000 -3000 -2000 -5000 -4000 -3000 -2000 

Aq [GeV] [GeV] 

rir rijet 



Figure 5: Average value per bin for the mean number of rs and jets per SUSY event 
shown in the mo — mi/2 plane (above) and Aq — tan/3 plane (below) 

In order to construct reference benchmark models that cover these different phenomeno- 

logical properties the sample was clustered according to the phenomenological observables: 

{|;2^,njet,PT(jeti),n^,pr(Ti)}, 

details are described in the Appendix 5. 

With these, nine phenomenological clusters shown in table 5 were found. The SUSY and 
model related parameters for these clusters are shown in table 6. The centroids of the 
clusters can be regarded as reference (benchmark) points. As it can be seen in table 5, 
around 160 events with high energetic taus in the central part of the detector are expected 
for the highest cross-section reference point, this is for 15/ fb of 8 TeV CMS energy at the 
LHC. Even if predicted 2012 luminosity per experiment is around 25/ fb, this number of 
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events might not be enough to detect the signal. Exploration of the other reference points 
might have to wait until the LHC 13 TeV operations. 

Various graphical projections of the clusters are shown in figures 6 and 7. It is clear from 
figures 6 that one finds viable models lying in the tails of clusters, far away from centroid 
positions. These models exhibit either low jet activity in the central part of the detector, 
but produce high pt tau leptons, or have low number of high pt jets (monojets) and low 
momentum taus. We have not investigated these models further yet, but it is clear that 
standard LHC SUSY searches assuming presence of high pt jets for triggering purpose 
might fail for such models. 

Table 8 gives branching fractions of the lightest Higgs boson for the reference points, 
compared to the SM Higgs ones. Alas, the branching fractions do not differ by more than 
5% from SM values for these models. 

Table 5: Phenomenological parameters of clusters found. The first two columns are the 
cluster index, id, (matches id in table 6) and number of model-points in the cluster, n. For 
each parameter the cluster centroid value, cent, is listed along with the minimum, min, 
and maximum, max, for the cluster. Centroid values can be regarded as reference values 
characterizing given experimental phenomenology. 



id n 


$T [GeV] 
min cent max 


rijet 

min cent max 


jeti(pT) [GeV] 
min cent max 


min cent max 


ri(pT) [GeV] 
min cent max 


1 17502 


61.0 384.9 467.2 


0.8 2.8 3.9 


159.9 424.9 581.1 


0.3 0.4 1.1 


60.7 129.7 218.7 


2 22771 


3.7 438.3 523.0 


0.1 2.3 3.8 


149.2 535.8 698.4 


0.4 0.5 2.0 


57.3 137.7 345.6 


3 33412 


362.9 432.5 516.4 


1.0 1.8 2.7 


451.6 582.1 730.4 


0.2 0.3 0.5 


105.9 242.0 308.0 


4 39476 


266.1 332.5 396.2 


2.3 3.1 4.1 


219.4 288.6 418.7 


0.1 0.2 0.4 


61.6 169.8 240.2 


5 21767 


0.6 274.7 339.2 


0.8 3.6 4.5 


130.6 232.2 362.9 


0.1 0.2 0.8 


55.2 137.7 207.7 


6 26284 


293.4 398.5 494.7 


1.1 2.6 3.6 


278.8 372.5 594.3 


0.0 0.2 0.3 


56.0 138.0 225.7 


7 40229 


172.1 350.9 432.9 


0.0 2.4 3.1 


55.4 391.6 519.9 


0.1 0.3 0.6 


149.5 218.5 336.8 


8 43078 


225.2 389.2 449.0 


0.2 1.9 2.7 


218.0 477.2 619.1 


0.1 0.3 0.5 


168.9 250.6 331.6 


9 39616 


186.6 395.1 467.8 


0.0 1.1 1.8 


315.4 633.8 856.6 


0.1 0.3 0.7 


213.3 291.2 371.9 



4 Conclusions 

This work presents a new method for finding and classifying SUSY models that can be po- 
tentially discovered in an accelerator experiment, here LHC experiments. The method uses 
an adaptive MCMC algorithm to find interesting models and uses a clustering algorithm to 
classify the models according to phenomenology. The likelihood map is constructed using 
an extendible tool chain that incorporates recent limits from multiple sources through the 
SLHA interface. As an example and test of the method the highly constrained CMSSM 
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Table 6: CMSSM parameters, relict density, 8 TeV CMS LHC production cross-section 
and the number of expected events with r leptons for the centroids of clusters. All models 
have sign/i > and mtop = 173 GeV. 



id 


mo [GeV] 


mi/2 [GeV] 


Ao [GeV] 


tan l3 




InP 




a[fb] 


1 


284.0 


614.4 


-1832.4 


14.5 


162.9 


-2.2 


0.1 


30.2 


2 


311.8 


795.7 


-2090.1 


13.2 


19.5 


-1.2 


0.1 


3.1 


3 


615.4 


793.4 


-2340.3 


26.4 


11.1 


-0.9 


0.1 


2.6 


4 


973.3 


919.7 


-3466.2 


30.2 


4.8 


-0.9 


0.1 


1.5 


5 


726.8 


915.2 


-3363.2 


22.9 


9.9 


-1.6 


0.1 


3.1 


6 


658.6 


812.2 


-2295.0 


28.7 


3.4 


-1.1 


0.1 


1.2 


7 


759.3 


833.7 


-2837.6 


28.0 


7.8 


-0.9 


0.1 


2.2 


8 


841.7 


903.3 


-2929.3 


29.9 


3.3 


-0.8 


0.1 


0.8 


9 


789.7 


983.9 


-2763.2 


28.7 


1.4 


-0.9 


0.1 


0.3 



Table 7 : Higgs and sparticles masses for the centroids of clusters. 



id 


rriho [GeV] 


rrii^ [GeV] 


m~g [GeV] 


m^o [GeV] 


rrif, [GeV] 


1 


123.3 


967.1 


2105 


408.2 


412.8 


2 


123.8 


1016 


2271 


444.5 


447.9 


3 


124.4 


844.9 


1923 


368.8 


372.2 


4 


124.9 


676.9 


1632 


307.9 


308.1 


5 


123.9 


1124 


1991 


384.8 


386.7 


6 


124.2 


896.1 


1975 


381.9 


382.7 


7 


124.9 


668 


1421 


262.8 


264.3 


8 


125.1 


1201 


2366 


466.7 


467.8 


9 


124.8 


879.9 


1867 


357.5 


360.4 



Table 8: The lightest Higgs branching fractions in relation to the SM value for the centroids 
of clusters. 



id 


J- H^ZZI J- H^ZZ 


j^SUSY frSM 

J- H^WW/ ^ H^WW 


T^SUSY frSM 


j^SUSY /-pSM 


T^SUSY frSM 


1 


1.03 


1.04 


1.02 


1.04 


1.04 


2 


1.03 


1.05 


1.03 


1.03 


1.03 


3 


1.03 


1.05 


1.03 


1.03 


1.03 


4 


1.03 


1.04 


1.03 


1.04 


1.04 


5 


1.03 


1.05 


1.03 


1.04 


1.04 


6 


1.04 


1.05 


1.03 


1.03 


1.03 


7 


1.03 


1.04 


1.03 


1.04 


1.04 


8 


1.04 


1.05 


1.03 


1.03 


1.03 


9 


1.04 


1.05 


1.03 


1.03 


1.03 
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Figure 6: Projections of clusters. Colors indicate to which cluster a model belongs. Black 
triangles indicate locations clusters centroids. 



parameter space has been searched for models that could potentially be discovered using 
2012 LHC with r-lepton based signatures. 

Although simplified models like CMSSM are severely constrained, we are still able to 
find regions fulfilling recent LHC bounds on Higgs mass and rare S-meson decays, and 
giving relic density in agreement with WMAP results. Nine reference (bench-mark) points 
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Figure 7: Projections of clusters. Colors indicate to which cluster a given model belongs. 



exhibiting different phenomenologies were found with use of a g-means clustering algorithm. 
These reference points can be used to optimize searches. This makes the method very 
attractive from an experimental point of view, and applying the method to other models 
and different signatures would be a natural extension of this work. 

The method has proven successful in finding and classifying SUSY models, but could still 
benefit from several extensions and improvements. More constraints could be added to 
the likelihoods and more advanced statistical analysis of the simulated data could be in- 
corporated. Another interesting prospect would be to include detector simulations using 
PGS [17] or DELPHES [16] to get somewhat more realistic estimates for the expected sig- 
nal, although we are skeptical about realism of such simulations. In addition, the MCMC 
algorithm constructed could still benefit from improvements to increase stability and effi- 
ciency, in addition to rigorous numerical testing and proofs of ergodicity. Finally a better 
distance measure for the clustering could allow for precise predictions of expected discovery 
potential. 

On the more experimental side we find some viable models lying in the tails of clusters 
formed based of phenomenological observables. These models exhibit either low jet activity 
in the central part of the detector, but produce high pt tau leptons, or have low number of 
high Pt jets (monojets) and low momentum taus. We have not investigated these models 
further yet, but it is clear that standard LHC SUSY searches assuming presence of high pt 
jets for triggering purpose might fail for such models. 
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5 Appendix: Scan Implementation, clustering and cross-checks 

The implementation of the scan was written in Python . It was capable of running multiple 
MCMC chains in parallel. The scan was initiated using a random sample containing roughly 
100 points to find an estimate for the proposal distribution. The initial points were required 
to pass all discrete constraints, have parameter values close to the central observational 
values for r(6 5 + 7), rriho and while having an expected number of produced r 

leptons, {Nr) > 1. The chosen values are shown in figure 9. 

Table 9: Experimental constraints used in the initial sampling. 



Constraints 


Range 


r(5-^ s + 7) 
mho 

{Nr) 


[0,0.2] 
[3, 4] • 10-4 
[122,128] GeV 
[1,^) 



Five clusters were established from the initial parameter space points with the k-means 
algorithm. This number was found to be sufficient to give a reasonable approximation of 
the likelihood distribution of the sample. From the initial sample, ten chains were initiated 
with two chains starting from each cluster. Before sampling started, each chain was required 
to reach a minimum likelihood to be included in the sample. This was chosen to be 2a 
away from the central value for r(6 5 + 7), rriho^ ^h? and {N^) — 1 corresponding to 
the likelihood, InPmin ^ — 3 • | — 0.5 = —6.5. 

The proposal distribution was updated at intervals AA^ = 10000 steps by adding the new 
sample points and recalculating cluster means and covariances. The new points were added 



-19- 



without weights since they were aheady a product of weighted samphng. For practical 
purposes an upper size on the clustering sample was set to be 10000 unique points, which 
was found to be sufficient to give a good proposal estimate. The search chains were run 
in parallel on ten cores for roughly 500 hours, resulting in a sample size of = 2896332, 
corresponding to 284496 unique models. From this sample 3938 outliers (corresponding to 
361 unique models) with log-likelihood In P < —6.5 were removed. 

In order to illustrate the effects of the different experimental and theoretical constraints, low 
energy properties were calculated for 300 000 models sampled uniformly within the search 
range. The computationally expensive Pythia simulations where not done for these models 
and a looser relic density constraint compared to the one used for MCMC initialization 
was used to get sufficient data to describe the qualitative features of the constraint. 

As a cross-check with the vast literature on the subject (see for example [34, 47, 48]) we 
briefly describe the effects of the most important constraints by visualizing how the initial 
selections affect the model density. The effects are illustrated in figure 8. We observe, in 
agreement with results of [15, 47] that: 

• Theoretical constraints 

Theoretical constraints remove the low mg and mi/2 regions primarily avoiding a 
fi-LSP and tachyonic sparticles. The excluded regions becomes larger at large tan j3 
and 1, and for large values of a considerable part of the low mass regions, 
^0,^1/2 ^ 1000 GeV gives tachyons. 

• Higgs mass niho ^ [122, 128] 

Requiring a 125 GeV Higgs mass, with positive fi excludes all positive values of 
within the selected mg, For large negative values of however, the 

t-loop corrections to the Higgs mass become large. This is the main reason for the 
asymmetry in A^ seen in 3. At large values of mi/2 and mo the FeynHiggs calculations 
of the Higgs mass corrections become inaccurate and thus we excluded these regions. 

• Relic Density Jlh^ < 1 

As is well known, the relic density Dark Matter in CMSSM is generally orders of 
magnitude larger than allowed by WMAP and PLANCK results [48-50] , apart from 
special regions where the relic density is suppressed by resonant neutralino annihila- 
tion or co-annihilation cross-sections. The low mi/2 region where m~o < 10, the relic 
density is mainly suppressed through x-annihilation to fermions through sfermion ex- 
change (low mo), and to VF, Z pairs (high mo). This region is excluded primarily by 
Higgs mass requirements. Along the Xi-LSP boundary the relic density is reduced by 
X — T-coannihilation, since the coannihilation cross-section is significantly enhanced 
due to mass degeneracy between the lightest stau and the lightest neutralino. The 
middle region in the mass plane, the well known Higgs funnel [51], corresponds to high 
tan P models with m~o ^ l/2m^o^^o, giving an increase in % — X-^^i^^ihilation through 
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heavy neutral higgs bosons, {H^,A^). The preference for Aq arises mainly from 
the fact that large parts of the low mi/2 regions exhibit charged LSP or tachyonic 
particles for large values of |Ao|, as off-diagonal terms in the third generation sfermion 
mass matrices grow with \Aq\. The preference for high tan/3 is in part due to the 
additional relic density suppression through the Higgs channel x-annihilation. 

• Rare Decays r(Bs ^ fifi) < 4.5 • 10"^, r(b ^ s + 7) G [3, 4] • 10"^ 

Of the constraints on decays, Bg /J^fJ^ poses the most stringent one, as the SUSY 
contribution grows like tan/3^. This branching fraction tends to get too large at low 
values of mO and mi/2- The size of the excluded area in the mass plane increases 
with increasing tan/3 and decreasing \Aq\. F (6 ^ 5 + 7) is generally too low compared 
to the central experimental value of 3.55 • 10~^ and excludes large parts of the low 
mi/2 ^ 500 range, stretching as far as mo ^ 2000 for high values of tan/3 and low 
\Ao\. Too high T{Bs ijlij) and too low V{h ^5 + 7), together with the requirement 
of non-tachyonic sparticles constrains the lowest allowed values of mi/2- 





\ [GeV] 

Theoretical constraints 




i 



|50| 
40 g. 



Th.C+m^o 



Th.C+Q/i^ 





\ [GeV] 



Figure 8: 2D-histograms in mo,mi/2 ^^^d Aq, tan^S-planes showing the effects of different 
constraints. The constraints used corresponds to requirements chosen for the initial sample 
given in table 9 



The properties of selected high likelihood models are presented in the results section, 3.1. 
The relatively wide range of values for SUSY masses and values of tan /3 for the selected 
models lead to a wide range of values of phenomenological properties such as average 
^7^, the average missing energy per SUSY event, Pt{^i)^Pt{]^^i)^ the average pT of the 
leading r and the leading jet, and n^-, njet, the average number of r's/jets per SUSY event, 
see 4. In order to construct reference models that cover these different phenomenological 
properties the sample was clustered according to the phenomenological observables listed 
above. In order to avoid bias from the scale of the different variables, each variable x is first 
transformed as x' — (xi — x)/ax so that the mean = and variance a^, = 1. Because 
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the non-Gaussian nature of clusters the g-means algorithm often fail and split too often. 
To remedy this the constraining parameters mentioned in section 2.4 are used. By setting 
an approximate maximum number of possible clusters nmax, one gets minp = [A/'oK/^max] 
for the minimum number of points in a cluster and min^ = riog2 ^maxl for the maximal 
splitting depth. The maximal number of iterations per split attempt was set to max^ = 10. 
Here the number is chosen to be well above the final number of clusters but low enough, for 
this case nmax = 40 was found to be appropriate. The minimal cluster distance parameter 
was set to min^ = 1.3. The optimization was run riavg = 5 times and an average of 8.8 
clusters were found. Thus, one of the results with nine clusters was picked at random. 
The properties of models at the centroids of these clusters, which can be seen as reference 
models for search optimization, are presented in the results section 3.1. 
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